Enhancing forecast accuracy using combination methods for the hierarchical time series approach

This study investigates whether combining forecasts generated from different models can improve forecast accuracy relative to individual models in a hierarchical time series setting. Three approaches to hierarchical forecasting are considered: bottom-up, top-down, and optimal combination. Autoregressive integrated moving average (ARIMA) and exponential smoothing (ETS) models were used to generate forecasts for all levels of the hierarchy, to show the effect of the forecasting model on each hierarchical approach. The results indicated that the Minimum Trace Sample estimator (MinT-Sample) and the bottom-up approach, both with ARIMA models, have better predictive performance than the other approaches. The forecasts from the MinT-Sample and bottom-up approaches were then combined using five different combining methods. The experimental results showed that the AC method is superior to all other combining methods and more accurate than the individual models at level zero (international total trade in Egypt) and level one (total exports and total imports). Combining forecasts generated from different hierarchical time series models therefore leads to more accurate forecasts of the value of imports and exports. These forecasts can be used to plan improvements to the trade balance and to draw up a more efficient production policy, which in turn can improve overall international trade performance. Finally, the study recommends using hierarchical forecasting methods in the area of international trade; the Ministry of Commerce and Industry could adopt the results of this study to produce precise forecasts for international trade. The results may also guide researchers in applying these approaches in other fields to improve forecasting performance.


Introduction
Hierarchical time series are multiple time series that are organized hierarchically and can be aggregated at several levels into groups based on geographic location, products, or other features. Several specialized strategies exist for forecasting them: bottom-up, top-down, a combination of the two called "middle-out", and the optimal combination approach [1]. In [2], Hyndman et al. propose an approach to hierarchical forecasting that provides optimal forecasts which are better than forecasts produced by either a top-down or a bottom-up approach. The method is based on independently forecasting all series at all levels of the hierarchy and then using a regression model to optimally combine and reconcile these forecasts. The resulting revised forecasts add up appropriately across the hierarchy, are unbiased, and have minimum variance amongst all combination forecasts under some simple assumptions. Their simulation study shows that the method performs well compared to the top-down and bottom-up approaches, and the method is demonstrated by forecasting Australian tourism demand, where the data are disaggregated by purpose of travel and geographical region. In [3], Makoni and Chikobvu model and forecast Victoria Falls Rainforest tourism demand using hierarchical forecasting methods; the top-down, bottom-up, and optimal combination approaches are adopted. Exponential smoothing techniques (EST) and autoregressive integrated moving average (ARIMA) models are the forecasting methods considered. Accuracy measures indicated that the bottom-up approach under ARIMA models was the best approach for the data and produced sensible future tourism forecasts.
Oliveira and Ramos [4] investigate the relative performance of independent and reconciled forecasting approaches using real data from a Portuguese retailer. Two alternative forecasting model families are considered for generating the base forecasts, namely state space models and ARIMA. Appropriate models from both families are chosen for each time series by minimizing the bias-corrected Akaike information criterion. The results show significant improvements in forecast accuracy, providing valuable information to support management decisions. In particular, reconciled forecasts using the Minimum Trace Shrinkage estimator (MinT-Shrink) generally improve on the accuracy of the ARIMA base forecasts for all levels and for the complete hierarchy, across all forecast horizons. Rehman et al. [5] note that hierarchical time series arise in manufacturing and service industries when the products or services have a hierarchical structure, and that top-down and bottom-up methods are commonly used to forecast them. One of the critical factors affecting the performance of the two methods is the correlation between the data series. Their study addresses this problem and shows that the top-down method performs better when the data have a high positive correlation than when they have a high negative correlation, and that a combination of forecasting methods may be the best solution when there is no evidence of a correlation relationship. Their results show that the regression-based, VAR-COV, and rank-based methods perform better than the other methods. Silveira and Azevedo [6] analyzed hourly power generation in Brazil (2018-2020), grouped by electrical subsystem and by energy source. The objective was to assess the accuracy of the main methods for aggregating and disaggregating the forecasts of Autoregressive Integrated Moving Average (ARIMA) and Error, Trend, and Seasonal (ETS) models.
Specifically, the bottom-up, top-down, and optimal reconciliation approaches were analyzed. The optimal reconciliation models showed the best mean performance, considering the primary predictive windows. It was also found that energy forecasts in the South subsystem were less accurate than the others, which signals the need for individualized models for this subsystem. Makoni et al. [7] had two objectives: first, to adopt hierarchical forecasting methods in modeling and forecasting international tourist arrivals in Zimbabwe; second, to construct prediction intervals (PIs) for the hierarchical tourism forecasts using Quantile Regression Averaging (QRA). Zimbabwe's monthly international tourist arrivals data from January 2002 to December 2018 were used, disaggregated according to the purpose of the visit. Three hierarchical forecasting approaches (top-down, bottom-up, and optimal combination) were applied to the data. The results showed the superiority of the bottom-up approach over both the top-down and optimal combination approaches. The forecasts indicate a general increase in the aggregate series. The combined methods provide new insight into modeling tourist arrivals.
The main motivation of this study is to investigate whether combining forecasts generated from different hierarchical time series models can improve forecast accuracy relative to using individual models.
In summary, the contributions of the study are:
1. Applying hierarchical time series forecasting through three different approaches: the bottom-up approach, the top-down approach, and the optimal combination approach.
2. Combining the forecasts from the best hierarchical forecasting approaches using five combination methods: the Simple Average method, the Geometric Mean, the Variance-Covariance method, Akaike weights, and the AC method.
This study is organized as follows. Section 2 provides a brief description of the methodology. The data are described in Section 3, and the results are presented in Section 4. Finally, the conclusions are summarized.

Methodology
Many applications in business and economics involve data that are organized hierarchically and can be aggregated at several levels into groups based on geography, products, or other features. Such data, for example international trade data, are called hierarchical time series. Common approaches to forecasting hierarchical time series are the bottom-up approach, the top-down approach, and the optimal combination approach proposed by Hyndman et al. [2]. The optimal combination approach has several advantages: it produces point forecasts that are reconciled across the levels of the hierarchy; it allows for the interactions and correlations between the series at every level; it produces estimates of forecast uncertainty that are reconciled across the levels; and it is flexible and provides optimal forecasts under some simple assumptions. The hierarchical forecasting approaches can capture the changes in international trade data and generate accurate forecasts. This section is divided into three parts: the first briefly introduces the hierarchical approaches used in this study, the second introduces the forecasting models (ARIMA and ETS), and the third presents the combining methods.

Hierarchical forecasting approaches
Three approaches are used in this study: bottom-up, top-down, and optimal combination.
2.1.1 The bottom-up approach. One of the most common approaches to hierarchical forecasting is the bottom-up approach, which first produces forecasts for each series at the bottom level and then aggregates them to obtain forecasts for all levels of the hierarchical structure. The advantage of this approach is that, by modeling the data at the most disaggregated bottom level, no information is lost due to aggregation, so the dynamics of the individual series can be better captured. However, bottom-level data can be quite noisy and, therefore, more challenging to model. The hierarchical methods can be represented by the general form
$$\tilde{Y}_n(h) = S P \hat{Y}_n(h), \qquad (1)$$
where $\hat{Y}_n(h)$ is the vector of h-step-ahead base forecasts for all $m$ series in the hierarchy, $\tilde{Y}_n(h)$ is the vector of revised forecasts, $S$ is the $m \times m_K$ summing matrix ($m_K$ being the number of bottom-level series), and $P$ is a matrix of order $m_K \times m$ whose role changes depending on the hierarchical approach. The bottom-up approach is represented in Eq (1) by
$$P = [\, 0_{m_K \times (m - m_K)} \mid I_{m_K} \,],$$
where $0_{i \times j}$ is the $i \times j$ null matrix and $I_{m_K}$ is the identity matrix of order $m_K$. Here the role of $P$ is to extract the bottom-level forecasts, which are subsequently aggregated by the summing matrix $S$ to provide the revised forecasts for the whole hierarchy. For more detail, refer to [6,8].
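The bottom-up mapping can be illustrated with a minimal sketch. It assumes a hierarchy shaped like the one used later in this study (one total, two middle-level series, four bottom-level series); the labels `Total`, `A`, `B`, `AA`, `AB`, `BA`, `BB` and all forecast values are hypothetical and are not taken from the study's data.

```python
# Summing matrix S (m = 7 rows, m_K = 4 columns) for a hypothetical hierarchy:
# Total -> {A, B}, A -> {AA, AB}, B -> {BA, BB}.
S = [
    [1, 1, 1, 1],  # Total
    [1, 1, 0, 0],  # A
    [0, 0, 1, 1],  # B
    [1, 0, 0, 0],  # AA
    [0, 1, 0, 0],  # AB
    [0, 0, 1, 0],  # BA
    [0, 0, 0, 1],  # BB
]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


m, m_k = len(S), len(S[0])

# Bottom-up P = [0 | I]: ignore the first m - m_K base forecasts and keep
# only the bottom-level ones.
P_bu = [[0] * (m - m_k) + [int(i == j) for j in range(m_k)] for i in range(m_k)]

# Hypothetical base forecasts for one horizon, ordered like the rows of S.
# They are deliberately incoherent: 100 != 62 + 41 and 62 != 50 + 10.
y_hat = [[100.0], [62.0], [41.0], [50.0], [10.0], [30.0], [12.0]]

# Revised forecasts: S P y_hat. Upper levels are rebuilt from the bottom level.
y_tilde = [row[0] for row in matmul(S, matmul(P_bu, y_hat))]
print(y_tilde)
```

In this sketch the revised upper-level forecasts are simply sums of the bottom-level base forecasts, so the base forecasts for `Total`, `A`, and `B` play no role in the result.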

Top-down approach.
The top-down approach is the other common approach in hierarchical forecasting. It disaggregates the forecasts of the total series and distributes them down the hierarchy according to historical proportions of the data. The top-down approach is represented in Eq (1) by
$$P = [\, p \mid 0_{m_K \times (m-1)} \,],$$
where $p = [p_1, p_2, \ldots, p_{m_K}]'$ is a set of proportions for the bottom-level series. The role of $P$ is to distribute the top-level forecasts to the bottom-level series. Different top-down forecasting methods lead to different proportionality vectors $p$.
In this study, three models of this approach are used [8].

Top-down forecasts based on the average historical proportions (Gross-Sohl method A) (TDGSA)
Each proportion $p_j$ reflects the average of the historical proportions of the bottom-level series $Y_{j,t}$ over the period $t = 1, \ldots, n$, relative to the total aggregate $Y_t$:
$$p_j = \frac{1}{n} \sum_{t=1}^{n} \frac{Y_{j,t}}{Y_t}.$$

Top-down forecasts based on the proportion of historical averages (Gross-Sohl method F) (TDGSF)
Each proportion $p_j$ captures the average historical value of the bottom-level series $Y_{j,t}$ relative to the average value of the total aggregate $Y_t$:
$$p_j = \frac{\sum_{t=1}^{n} Y_{j,t} / n}{\sum_{t=1}^{n} Y_t / n}.$$
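The difference between the two Gross-Sohl proportions is easiest to see on a toy example. The sketch below assumes the usual definitions (method A averages the per-period proportions, method F takes the ratio of the historical averages); the series names and all numbers are hypothetical.

```python
# Hypothetical history for two bottom-level series (not data from this study).
history = {
    "AA": [10.0, 20.0, 30.0],
    "AB": [10.0, 20.0, 10.0],
}
n = 3

# Total aggregate Y_t: the sum of the bottom-level series in each period.
total = [sum(series[t] for series in history.values()) for t in range(n)]


def tdgsa(series, total):
    """Gross-Sohl method A: average of the historical proportions Y_jt / Y_t."""
    return sum(y / tot for y, tot in zip(series, total)) / len(series)


def tdgsf(series, total):
    """Gross-Sohl method F: ratio of the historical averages."""
    return (sum(series) / len(series)) / (sum(total) / len(total))


p_a = {name: tdgsa(s, total) for name, s in history.items()}
p_f = {name: tdgsf(s, total) for name, s in history.items()}

# Distribute a hypothetical top-level base forecast down to the bottom level.
top_forecast = 90.0
bottom_a = {name: p * top_forecast for name, p in p_a.items()}
bottom_f = {name: p * top_forecast for name, p in p_f.items()}
print(p_a, p_f)
```

Both sets of proportions sum to one, so the disaggregated forecasts add back up to the top-level forecast, but the two methods generally split it differently when the proportions change over time.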

Top-down forecasts using forecast proportions (TDFP)
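In this method the proportions are computed from the forecasts themselves rather than from historical data:
$$p_j = \prod_{\ell=0}^{K-1} \frac{\hat{Y}^{(\ell)}_{j,n}(h)}{\hat{S}^{(\ell+1)}_{j,n}(h)},$$
where $\hat{Y}^{(\ell)}_{j,n}(h)$ is the h-step-ahead forecast of the series at the node that is $\ell$ levels above node $j$, and $\hat{S}^{(\ell)}_{j,n}(h)$ is the sum of the h-step-ahead forecasts below the node that is $\ell$ levels above node $j$ and directly connected to that node. For more detail, refer to [6,8,9].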

Optimal combination approach.
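Hyndman et al. [2] suggested a new approach to forecasting hierarchical time series. The method is based on independently forecasting all series at all levels of the hierarchy and then combining and reconciling these forecasts using a regression model. The base forecasts can be written as
$$\hat{Y}_n(h) = S \beta_n(h) + \varepsilon_h, \qquad (2)$$
where $\beta_n(h) = E[Y_{K,n+h} \mid Y_1, \ldots, Y_n]$ is the unknown mean of the bottom level $K$, and $\varepsilon_h$ has zero mean and covariance matrix $\mathrm{Var}(\varepsilon_h) = \Sigma_h$. $\beta_n(h)$ is then estimated by treating Eq (2) as a regression equation, which yields forecasts for all levels of the hierarchy. If $\Sigma_h$ were known, generalized least squares estimation would give the minimum variance unbiased estimate of $\beta_n(h)$ as
$$\hat{\beta}_n(h) = (S' \Sigma_h^{\dagger} S)^{-1} S' \Sigma_h^{\dagger} \hat{Y}_n(h),$$
where $\Sigma_h^{\dagger}$ is the Moore-Penrose generalized inverse of $\Sigma_h$. The revised forecasts are given by
$$\tilde{Y}_n(h) = S \hat{\beta}_n(h) = S P \hat{Y}_n(h),$$
where $P = (S' \Sigma_h^{\dagger} S)^{-1} S' \Sigma_h^{\dagger}$, which satisfies the unbiasedness property $SPS = S$. This condition holds for the bottom-up approach but not for the top-down approach, for which $SPS \neq S$. So the top-down approaches will never give unbiased forecasts, even if the base forecasts are unbiased. The variance of the revised forecasts is given by
$$\mathrm{Var}[\tilde{Y}_n(h)] = S (S' \Sigma_h^{\dagger} S)^{-1} S'.$$
In general, $\Sigma_h$ is not known and is not identifiable [10]. The residuals from the regression model in Eq (2) have a rank-deficient covariance matrix; consequently, $\Sigma_h$ cannot be identified. The covariance matrix of the h-step-ahead reconciled forecast errors $\tilde{e}_n(h) = Y_{n+h} - \tilde{Y}_n(h)$ is
$$\mathrm{Var}[Y_{n+h} - \tilde{Y}_n(h) \mid I_n] = S P W_h P' S', \qquad (3)$$
where $W_h = E[\hat{e}_n(h) \hat{e}_n(h)' \mid I_n]$ is the variance-covariance matrix of the h-step-ahead base forecast errors $\hat{e}_n(h) = Y_{n+h} - \hat{Y}_n(h)$. The purpose is to find the matrix $P$ that minimizes the trace of $\mathrm{Var}[Y_{n+h} - \tilde{Y}_n(h) \mid I_n]$ in Eq (3) subject to $SPS = S$, which gives the best (minimum variance) linear unbiased reconciled forecasts. This is referred to as MinT (minimum trace) reconciliation and is computed as
$$P = (S' W_h^{-1} S)^{-1} S' W_h^{-1},$$
where $W_h$ is the covariance matrix of the base forecast errors. Although $W_h$ does not suffer from a lack of identification, it is difficult to estimate, especially for $h > 1$. There are several ways to approximate $W_h$, such as [10]:
OLS: $W_h = k_h I$, $\forall h$, where $k_h > 0$. This method is optimal when the base forecast errors are uncorrelated and equivariant.
WLSv: $W_h = k_h \, \mathrm{diag}(\hat{W}_1)$, $\forall h$, where $k_h > 0$ and $\hat{W}_1$ is the sample covariance matrix of the in-sample one-step-ahead base forecast errors. In this case, MinT is described as a weighted least squares (WLS) estimator applying variance scaling.
WLSs: $W_h = k_h \Lambda$, $\forall h$, where $k_h > 0$ and $\Lambda = \mathrm{diag}(S \mathbf{1})$, with $\mathbf{1}$ being a unit column vector whose dimension equals the number of bottom-level series, $m_K$.
MinT(Sample): $W_h = k_h \hat{W}_1$, $\forall h$, where $k_h > 0$.
MinT(Shrink): $W_h = k_h \hat{W}^{*}_{1,D}$, $\forall h$, where $k_h > 0$ and $\hat{W}^{*}_{1,D} = \lambda_D \hat{W}_{1,D} + (1 - \lambda_D) \hat{W}_1$ is a shrinkage estimator with diagonal target, $\hat{W}_{1,D}$ is a diagonal matrix comprising the diagonal entries of $\hat{W}_1$, and $\lambda_D$ is the shrinkage intensity parameter.
For more detail, refer to [2,9,10].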
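As a rough illustration of reconciliation with $P = (S' W^{-1} S)^{-1} S' W^{-1}$, the sketch below uses a diagonal $W$ only (the OLS and structural-scaling cases). The hierarchy, base forecasts, and top-down proportions are hypothetical, the helper names are illustrative, and MinT(Sample) or MinT(Shrink) would additionally require a full covariance estimate from in-sample residuals, which is not shown. Exact rational arithmetic is used so that the property $SPS = S$ can be checked without rounding error.

```python
from fractions import Fraction

# Summing matrix for a hypothetical 7-series hierarchy:
# Total -> {A, B}, A -> {AA, AB}, B -> {BA, BB}.
S = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """Gauss-Jordan inverse of a small non-singular matrix, exact with Fractions."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mint_projection(S, w_diag):
    """P = (S' W^-1 S)^-1 S' W^-1 for a diagonal W given by its diagonal entries."""
    St = [list(col) for col in zip(*S)]
    St_Winv = [[Fraction(v) / w for v, w in zip(row, w_diag)] for row in St]
    return matmul(inverse(matmul(St_Winv, S)), St_Winv)


w_ols = [1] * len(S)                  # W = I (OLS)
w_struct = [sum(row) for row in S]    # W = diag(S 1) (structural scaling)

# Hypothetical, deliberately incoherent base forecasts (rows ordered as in S).
y_hat = [[100], [62], [41], [50], [10], [30], [12]]

P = mint_projection(S, w_struct)
y_tilde = [row[0] for row in matmul(S, matmul(P, y_hat))]
print([float(v) for v in y_tilde])

# A top-down P = [p | 0] with hypothetical proportions, for comparison.
P_td = [[Fraction(p)] + [0] * 6 for p in ("2/5", "1/10", "3/10", "1/5")]
print(matmul(matmul(S, P_td), S) == S)  # False: top-down violates SPS = S
```

Unlike the bottom-up approach, the reconciled forecasts here use information from every level of the hierarchy, while still adding up exactly across levels.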

Forecasting models
Two forecasting models, ARIMA and ETS, are presented briefly in this subsection. They are used to generate forecasts for all levels of the hierarchy, to show the influence of the forecasting model on each hierarchical approach. The two families are based on different perspectives on the problem and often, but not always, perform differently, although they share some mathematically equivalent models (Oliveira and Ramos [4]).
2.2.1 ARIMA models. ARIMA models are among the most popular models for forecasting time series; different types of stochastic seasonal and non-seasonal time series can be represented by them. The general form of the seasonal ARIMA model is
$$\phi_p(B) \, \Phi_P(B^S) \, (1 - B)^d (1 - B^S)^D y_t = \theta_q(B) \, \Theta_Q(B^S) \, \varepsilon_t,$$
where $\phi_p(B)$ and $\theta_q(B)$ are the regular autoregressive and moving average polynomials of orders $p$ and $q$, respectively; $\Phi_P(B^S)$ and $\Theta_Q(B^S)$ are the seasonal autoregressive and moving average polynomials of orders $P$ and $Q$, respectively; $S$ is the period of seasonality; $D$ is the degree of seasonal differencing; $d$ is the degree of ordinary differencing; $B$ is the backshift operator; and $\varepsilon_t$ is a white noise process with zero mean and variance $\sigma^2$ [4]. For more detail, refer to [11].
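The differencing part of a seasonal ARIMA model, $(1 - B)^d (1 - B^S)^D y_t$, can be sketched in a few lines. The function names and the example series below are hypothetical; the sketch only shows how seasonal and ordinary differencing remove a fixed seasonal pattern and a linear trend, and it does not estimate any ARIMA parameters.

```python
def difference(y, lag=1):
    """Apply (1 - B^lag): return y_t - y_{t-lag} for every t >= lag."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


def arima_difference(y, d=0, D=0, period=12):
    """Apply (1 - B)^d (1 - B^period)^D to the series y."""
    for _ in range(D):
        y = difference(y, period)   # seasonal differencing
    for _ in range(d):
        y = difference(y, 1)        # ordinary differencing
    return y


# Hypothetical monthly series: linear trend plus a fixed seasonal pattern.
season = [5, 3, 0, -2, -4, -5, -3, 0, 2, 4, 3, -3]
y = [2 * t + season[t % 12] for t in range(36)]

seasonally_differenced = arima_difference(y, d=0, D=1, period=12)
fully_differenced = arima_difference(y, d=1, D=1, period=12)
print(seasonally_differenced[:3], fully_differenced[:3])
```

Each seasonal difference shortens the series by one full period and each ordinary difference by one observation, which matters for short monthly samples such as the one used in this study.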

Exponential smoothing.
Ord et al. [12] extended the work of Snyder [13] by proposing a class of innovations state-space models that can be considered as underlying some of the exponential smoothing methods. Hyndman et al. [14] and Taylor [15] extended this to include fifteen exponential smoothing methods [16]. These fifteen methods are distinguished by the nature of the trend and seasonal components [17]. Hyndman et al. [18,19] described two possible innovations state-space models for each of the fifteen methods, resulting in thirty different models, and developed an automatic forecasting method using these models. Each model is labeled by a triplet (E, T, S), where E, T, and S stand for the error, trend, and seasonality components, respectively. The general model involves a state vector $x_t = (l_t, b_t, s_t, s_{t-1}, \ldots, s_{t-m+1})'$ and state-space equations of the form
$$y_t = w(x_{t-1}) + r(x_{t-1}) \varepsilon_t,$$
$$x_t = f(x_{t-1}) + g(x_{t-1}) \varepsilon_t,$$
where $\varepsilon_t$ is a Gaussian white noise process with mean zero and variance $\sigma^2$, and $\mu_t = w(x_{t-1})$. There are two state-space models for each method: one with additive errors and the other with multiplicative errors. The model with additive errors has $r(x_{t-1}) = 1$, so
$$y_t = \mu_t + \varepsilon_t.$$
The model with multiplicative errors has $r(x_{t-1}) = \mu_t$, so
$$y_t = \mu_t (1 + \varepsilon_t).$$
Thus, $\varepsilon_t = (y_t - \mu_t) / \mu_t$ is the relative error for the multiplicative model. Any value of $r(x_{t-1})$ will lead to identical point forecasts for $y_t$ [17]. For more detail, refer to [14,18,19].
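The statement that additive and multiplicative errors give identical point forecasts can be checked on the simplest case. The sketch below assumes the standard ETS(A,N,N) and ETS(M,N,N) recursions (a level only, no trend or seasonality) with the same smoothing parameter and initial level; the function name and data are hypothetical, and in practice the parameters would be estimated rather than fixed.

```python
def ets_level_filter(y, alpha, level0, error="additive"):
    """Run ETS(A,N,N) or ETS(M,N,N); return one-step forecasts and innovations."""
    level, forecasts, innovations = level0, [], []
    for obs in y:
        mu = level                             # mu_t = w(x_{t-1}) = l_{t-1}
        if error == "additive":
            eps = obs - mu                     # r(x_{t-1}) = 1
            level = level + alpha * eps        # l_t = l_{t-1} + alpha * eps_t
        else:
            eps = (obs - mu) / mu              # r(x_{t-1}) = mu_t (relative error)
            level = level * (1 + alpha * eps)  # l_t = l_{t-1} (1 + alpha * eps_t)
        forecasts.append(mu)
        innovations.append(eps)
    return forecasts, innovations


# Hypothetical observations, smoothing parameter, and initial level.
y = [100.0, 110.0, 105.0, 120.0]
f_add, e_add = ets_level_filter(y, alpha=0.3, level0=100.0, error="additive")
f_mul, e_mul = ets_level_filter(y, alpha=0.3, level0=100.0, error="multiplicative")
print(f_add)
print(f_mul)
```

The one-step forecasts coincide up to floating-point rounding, while the innovations differ by the scale factor $\mu_t$; the two error types would therefore differ in their prediction intervals rather than in their point forecasts.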

Combined forecasts
Combined forecasts were introduced by Bates and Granger [20], and several forecast combination methods have been developed in the literature [21,22]. In this study, five combination methods were used: the Simple Average method, the Geometric Mean, the Variance-Covariance method, Akaike weights, and the AC method.

Simple Average (SA).
In this method, the forecasts are combined by assigning equal weights to each individual forecast. The combined forecast can be expressed as
$$f_c = \sum_{i=1}^{n} w_i f_i,$$
where $f_i$ is the $i$th single forecast; $f_c$ is the combined forecast generated from the $n$ single forecasts $f_i$; $n$ is the total number of individual forecasting models; and $w_i$ is the combination weight assigned to $f_i$, which is specified as $w_i = 1/n$, following Wong et al. [23,24].
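A weighted combination with equal weights as the default can be sketched as follows; the function name and forecast values are hypothetical.

```python
def combine(forecasts, weights=None):
    """Weighted combination f_c = sum_i w_i * f_i; equal weights 1/n by default."""
    n = len(forecasts)
    if weights is None:
        weights = [1.0 / n] * n   # Simple Average: w_i = 1/n
    return sum(w * f for w, f in zip(weights, forecasts))


# Two hypothetical forecasts of the same quantity from different approaches.
f_sa = combine([100.0, 110.0])
print(f_sa)
```

Passing an explicit `weights` list lets the same function serve the other weighting schemes, which differ only in how the weights are chosen.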

Geometric Mean (GM).
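Suppose the forecasts from two forecasting models, $f_{1t}$ and $f_{2t}$, are to be combined. The combined forecast using the Geometric Mean can be expressed as
$$f_c = \sqrt{f_{1t} \, f_{2t}}.$$

Variance-Covariance (VACO).
This method was proposed by Bates and Granger [20]. Following Shen et al. [25,26], suppose the combined forecast from two unbiased forecasting models is given as
$$f_c = w f_{1t} + (1 - w) f_{2t},$$
where $f_c$ is the combined forecast based on the individual forecasts $f_{1t}$ and $f_{2t}$, and $w$ and $(1 - w)$ are the weights assigned to $f_{1t}$ and $f_{2t}$, respectively. The weight that minimizes the combined forecast variance is
$$w = \frac{\sigma_{22}^2 - \sigma_{12}}{\sigma_{11}^2 + \sigma_{22}^2 - 2 \sigma_{12}},$$
where $\sigma_{11}^2$ and $\sigma_{22}^2$ are the unconditional individual forecast error variances and $\sigma_{12}$ is their covariance. In practice, Bates and Granger [20] suggested Eq (4) to combine the forecasts:
$$w = \frac{\sum_{t=1}^{T} e_{2t}^2}{\sum_{t=1}^{T} e_{1t}^2 + \sum_{t=1}^{T} e_{2t}^2}, \qquad (4)$$
where $e_{1t}$ and $e_{2t}$ are the individual forecast errors and $T$ is the sample size. For more than two individual forecasts the weights can be calculated, according to Fritz et al. [27], by
$$w_i = \frac{\left[ \sum_{t=1}^{T} e_{it}^2 \right]^{-1}}{\sum_{j=1}^{n} \left[ \sum_{t=1}^{T} e_{jt}^2 \right]^{-1}}.$$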
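The variance-covariance weights can be sketched as below. The sketch assumes that Eq (4) takes the usual Bates-Granger form that ignores the covariance term, $w = \sum e_{2t}^2 / (\sum e_{1t}^2 + \sum e_{2t}^2)$, and that the extension to several models uses weights proportional to the inverse sum of squared errors; the function names and error series are hypothetical.

```python
def optimal_weight(var1, var2, cov12):
    """Variance-minimizing weight on forecast 1 given error variances and covariance."""
    return (var2 - cov12) / (var1 + var2 - 2.0 * cov12)


def bates_granger_weight(e1, e2):
    """Weight on forecast 1 when the covariance term is ignored (assumed Eq (4))."""
    sse1 = sum(e * e for e in e1)
    sse2 = sum(e * e for e in e2)
    return sse2 / (sse1 + sse2)


def inverse_sse_weights(errors):
    """Weights for several models, proportional to 1 / (sum of squared errors)."""
    inv = [1.0 / sum(e * e for e in errs) for errs in errors]
    total = sum(inv)
    return [v / total for v in inv]


# Hypothetical in-sample forecast errors: model 2 errs twice as much as model 1.
e1 = [1.0, -1.0, 2.0, -2.0]
e2 = [2.0, -2.0, 4.0, -4.0]

w = bates_granger_weight(e1, e2)
w_all = inverse_sse_weights([e1, e2])
print(w, w_all)
```

The more accurate model receives the larger weight, and with two models the inverse-error weights reduce to the two-model formula.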

Akaike weights.
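In this method, Akaike's Information Criterion (AIC) is computed for each model (Burnham and Anderson [28]; Acquah [29]; Hsiao and Wan [30]; Piłatowska [31]), and the weights are calculated as
$$w_i = \frac{\exp\left(-\tfrac{1}{2} \Delta_i\right)}{\sum_{j=1}^{N} \exp\left(-\tfrac{1}{2} \Delta_j\right)}, \qquad \Delta_i = AIC_i - AIC_{min},$$
where $AIC_{min}$ is the minimum of the $N$ different $AIC_i$ values.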
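Akaike weights in their standard form, $w_i \propto \exp(-\Delta_i / 2)$ with $\Delta_i = AIC_i - AIC_{min}$, can be computed as in the following sketch; the function name and AIC values are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Return weights proportional to exp(-0.5 * (AIC_i - AIC_min))."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


# Hypothetical AIC values for two competing models.
weights = akaike_weights([100.0, 102.0])
print(weights)
```

Subtracting the minimum AIC before exponentiating leaves the weights unchanged but avoids numerical underflow when the AIC values are large.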

AC.
Altavilla and Ciccarelli [32] proposed a methodology that modifies the Aggregated Forecast Through Exponential Reweighting (AFTER) procedure proposed by Yang [33]; for more detail, refer to [33,34]. Under this modification, the weight attributed to a model at time $\tau$ is larger the better that model forecast the actual value at $\tau - 1$ only, rather than over all previous periods.
The weights are based on $S_t^2$, the sample variance of the dependent variable:
$$S_t^2 = (t - 1)^{-1} \sum_{s=1}^{t-1} (y_s - m_t)^2, \qquad m_t = t^{-1} \sum_{s=1}^{t-1} y_s.$$

Data
The hierarchical structure of the dataset is illustrated in Fig 1, and the number of series at each hierarchical level is presented in Table 1. Table 1 shows that the hierarchy consists of three levels: the top level (level 0) represents the international total trade, the middle level (level 1) represents total exports and total imports separately, and the bottom level (level 2) contains the disaggregated series. Fig 2 shows the characteristics of the disaggregated series. In total, the hierarchy includes seven time series. Each contains 70 monthly observations from January 2015 to October 2020, used as the estimation period, and the data from November 2020 to April 2021 are used as the testing period. The data were obtained from the Central Agency for Public Mobilization and Statistics (Egypt): https://www.capmas.gov.eg/Pages/Publications.aspx?page_id=5107&Year=23614.
According to Fig 2, non-petroleum imports constitute the larger share of total imports, and non-petroleum exports constitute the larger share of total exports. In general, the series fluctuate over time, as shown in Fig 2.

The results
This section provides the empirical results obtained, starting with presenting the descriptive statistics of the data, then comparing the performance between hierarchical approaches, and finally comparing the performance of the individual forecasts with the combined forecasts.

Descriptive statistics
The descriptive statistics of the series are presented in Table 2. They indicate that the mean of BA (Non-Petroleum Imports) is the highest and that the minimum, 121, occurs for the export series AB (Crude Oil & its Products). The skewness values indicate that the data are skewed to the left, except for AB and BB (Crude Oil & its Products). The kurtosis values indicate that most of the series are leptokurtic.

Results from individual models
The Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE), two of the most popular forecast error measures, were used. MAPE is a relative measure of performance that has been used frequently in the hierarchical time series literature; it was used here alongside the RMSE:
$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2},$$
where $\hat{y}_t$ and $y_t$ are the estimated and actual values, respectively, and $n$ is the number of observations (Silveira and Azevedo [6]). Tables 3-6 contain the MAPE and RMSE values for each forecast horizon obtained with the bottom-up, top-down, and optimal combination approaches described in Section 2.1, for the ETS and ARIMA models; the last column contains the average across all forecast horizons. The computations were carried out in R. Tables 3-6 show that the error produced by the ARIMA models was lower than that produced by the ETS models at all levels, and that the Minimum Trace Sample estimator and the bottom-up approach with ARIMA models have good predictive performance, even as the forecast horizon increases. The MinT(Shrink) approach is better than the other optimal combination approaches such as OLS and WLS under the ARIMA model. Moreover, the top-down approach did not produce good predictive results.
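The two accuracy measures, in their standard definitions, can be sketched as follows; the function names and the actual and forecast values are hypothetical and do not come from Tables 3-6.

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


def rmse(actual, predicted):
    """Root Mean Square Error, in the units of the data."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Hypothetical test-period values and forecasts.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape(actual, predicted), rmse(actual, predicted))
```

MAPE is scale-free, which makes it comparable across levels of the hierarchy, whereas RMSE is scale-dependent and penalizes large errors more heavily.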

Results from combining forecasts
The main focus of this study is to investigate empirically whether combining the forecasts generated from different models can improve forecasting accuracy relative to using individual models (Tables 7 and 8).
The results of this study showed that the AC method works better than the other combination methods for forecasting the value of imports and exports, and that it is more accurate than the best individual forecasts at level zero (international total trade in Egypt) and level one (total exports and total imports). It therefore provides forecasts that can be relied upon economically in planning international trade.

Conclusion
This study used different hierarchical forecasting approaches to obtain the best models and forecasts for international trade in Egypt. ARIMA and exponential smoothing (ETS) models were used as forecasting models for all levels of the hierarchy. The results showed that the error generated by the ARIMA models was lower than that generated by the ETS models. The Minimum Trace Sample estimator and the bottom-up approach with ARIMA models have good forecasting performance compared to the other approaches. These two approaches, as the best individual approaches, were then combined using five different combining methods. The results indicated that the AC method performs better than the other combination methods for forecasting the value of imports and exports, and that it is more accurate than the best individual forecasting approaches at level zero (international total trade in Egypt) and level one (total exports and total imports). The AC method therefore provides more accurate forecasts that can be relied upon economically in improving the trade balance and drawing up a more efficient production policy. The combined forecasts thus provide new insight into modeling international trade in Egypt that benefits the government, exporters, and importers. The study recommends using hierarchical forecasting methods for international trade volume because they produce acceptable forecasts.